• OpenAccess
    • List of Articles Dataset

      • Open Access Article

        1 - Learning to Rank for the Persian Web Using the Layered Genetic Programming
        Amir Hosein Keyhanipour
        Learning to rank (L2R) has emerged as a promising approach to the challenges facing Web search engines. However, present learning to rank techniques have major drawbacks. Current L2R algorithms do not take into account the search behavior of users embedded in their search session logs. Moreover, machine learning is a data-intensive process that requires a large volume of data about users’ queries as well as Web documents. This situation has made the use of L2R techniques questionable in real-world applications. Recently, a novel approach named MGP-Rank was proposed, based on the click-through data model and the generation of click-through features. Using the layered genetic programming model, MGP-Rank has achieved noticeable performance in ranking English Web content. In this study, suitable scenarios for generating click-through features are presented that take into account the specific characteristics of the Persian language. In this way, a customized version of MGP-Rank is proposed for Persian Web retrieval. Evaluation of this algorithm on the dotIR dataset indicates a considerable improvement over major ranking methods. The improvement is particularly noticeable in the top part of the search result lists, which Web users visit most frequently.
      • Open Access Article

        2 - Use of conditional generative adversarial network to produce synthetic data with the aim of improving the classification of users who publish fake news
        Arefeh Esmaili, Saeed Farzi
        Fake news and messages have long spread through human societies, and today, with the spread of social networks, the possibility of spreading false information is greater than ever. Detecting fake news and messages has therefore become a prominent issue in the research community. It is also important to detect the users who generate this false information and publish it on the network. This paper detects users who publish incorrect information on the Persian-language Twitter social network. To this end, a system is built that combines context-user and context-network features, with a conditional generative adversarial network (CGAN) used to balance the data set. The system also models the Twitter social network as a graph of user interactions and embeds each node into a feature vector with Node2vec. In several tests, the proposed system improved precision, recall, F-measure, and accuracy by up to 11%, 13%, 12%, and 12%, respectively, compared with its competitors, and reached about 99% precision in detecting users who publish fake news.
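The four metrics named in this abstract (precision, recall, F-measure, accuracy) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `binary_metrics`, the label encoding (1 = user who publishes fake news, 0 = other user), and the toy labels are all assumptions made for illustration, and F-measure is taken here to mean the balanced F1 score.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, F1 and accuracy for binary labels (1 = positive class)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    # Guard every ratio against a zero denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(y_true) if len(y_true) else 0.0

    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical toy labels: 3 fake-news publishers among 8 users.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```

On an imbalanced data set, accuracy alone can look high even when few positive users are found, which is one reason precision and recall are usually reported alongside it.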
      • Open Access Article

        3 - Synthesizing an image dataset for text detection and recognition in images
        Fatemeh Alimoradi, Farzaneh Rahmani, Leila Rabiei, Mohammad Khansari, Mojtaba Mazoochi
        Text detection in images is one of the most important sources of information for image recognition. Although much research has been conducted on text detection and recognition, and on end-to-end models (models that perform detection and recognition in a single model) based on deep learning, for languages such as English and Chinese, the main obstacle to developing such models for Persian is the lack of a large training data set. In this paper, we design and build the tools required to synthesize a data set of Persian scene text images with parameters such as color, size, font, and text rotation. These tools are used to generate a large yet varied data set for training deep learning models. Because of design considerations in the synthesis tools and the resulting variety of texts, models do not depend on the synthesis parameters and can generalize. As a sample data set, 7603 scene text images and 39660 cropped word images are synthesized. The advantage of our method over real images is that any number of images can be synthesized without manual annotation. As far as we know, this is the first large open-source data set of scene text images for Persian.
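The idea of varying color, size, font, and rotation while getting annotations for free can be sketched as a parameter sampler. This is only an illustration under stated assumptions, not the authors' tool: the names `SynthSample`, `sample_params`, and `make_samples`, the font file names, the value ranges, and the word list are all hypothetical, and the sketch stops short of rendering any image.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class SynthSample:
    """Rendering parameters for one synthetic word; `text` doubles as the label."""
    text: str
    font: str
    size_px: int
    color_rgb: Tuple[int, int, int]
    rotation_deg: float


def sample_params(words: Sequence[str], fonts: Sequence[str], rng: random.Random) -> SynthSample:
    # Each parameter is drawn independently; the ranges are assumed, not from the paper.
    return SynthSample(
        text=rng.choice(words),
        font=rng.choice(fonts),
        size_px=rng.randint(16, 64),
        color_rgb=(rng.randint(0, 255), rng.randint(0, 255), rng.randint(0, 255)),
        rotation_deg=rng.uniform(-30.0, 30.0),
    )


def make_samples(n: int, words: Sequence[str], fonts: Sequence[str], seed: int = 0) -> List[SynthSample]:
    # A seeded generator makes the synthetic set reproducible.
    rng = random.Random(seed)
    return [sample_params(words, fonts, rng) for _ in range(n)]


# Hypothetical inputs: a tiny Persian word list and placeholder font files.
WORDS = ["سلام", "کتاب", "ایران"]
FONTS = ["font_a.ttf", "font_b.ttf"]
samples = make_samples(5, WORDS, FONTS, seed=42)
```

Because the generator chooses the text before rendering, every image carries its ground-truth label by construction, which is why no manual annotation is needed.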
      • Open Access Article

        4 - Survey on the Applications of the Graph Theory in the Information Retrieval
        Maryam Piroozmand, Amir Hosein Keyhanipour, Ali Moeini
        Due to its power in modeling complex relations between entities, graph theory has been widely used in dealing with real-world problems. At the same time, information retrieval has emerged as one of the major problems in the area of algorithms and computation. As graph-based information retrieval algorithms have been shown to be efficient and effective, this paper aims to provide an analytical review of these algorithms and propose a categorization of them. Briefly, graph-based information retrieval algorithms can be divided into three major classes. The first category includes algorithms that use a graph representation of the corresponding dataset within the information retrieval process. The second category contains semantic retrieval algorithms that utilize graph theory. The third category is associated with the application of graph theory to the learning to rank problem. The reviewed research works are analyzed by both frequency and publication time. An interesting finding of this review is that the third category is a relatively hot research topic in which only a limited number of recent research works have been conducted.
      • Open Access Article

        5 - Noor Analysis: A Benchmark Dataset for Evaluating Morphological Analysis Engines
        Huda Al-Shohayyeb, Behrooz Minaei, Mohammad Ebrahim Shenassa, Sayyed Ali Hossayni
        The Arabic language has a very rich and complex morphology, whose analysis is very useful for processing Arabic, especially traditional texts such as historical and religious ones, and helps in understanding the meaning of the texts. In a morphological data set, the variety of labels and the number of data samples help to evaluate morphological methods. The morphological dataset presented in this research includes about 223,690 words from the book Sharia al-Islam, labeled by experts. It is the largest such dataset in terms of volume, and its variety of labels is superior to other data provided for Arabic morphological analysis. To evaluate the data, we applied the Farasa system to the texts, and we report the annotation quality through four evaluations on the Farasa system.